Skip to content

LCORE-836 spike: merge run.yaml into lightspeed-stack.yaml#1580

Merged
tisnik merged 16 commits into
lightspeed-core:mainfrom
max-svistunov:lcore-836-spike-llama-stack-config-merge
May 26, 2026
Merged

LCORE-836 spike: merge run.yaml into lightspeed-stack.yaml#1580
tisnik merged 16 commits into
lightspeed-core:mainfrom
max-svistunov:lcore-836-spike-llama-stack-config-merge

Conversation

@max-svistunov
Copy link
Copy Markdown
Contributor

@max-svistunov max-svistunov commented Apr 23, 2026

LCORE-836: merge run.yaml into lightspeed-stack.yaml (spike)

Spike deliverable for LCORE-836 — single-file operator-facing configuration for Lightspeed Core.

This branch is now docs-only. It has been rebased onto current upstream/main and the PoC code + tests removed (pre-merge cleanup per howto-run-a-spike step 10). What remains: the spike doc, the spec doc, and the PoC evidence under poc-results/. The PoC code was reviewed in this PR's earlier revisions.

What's in this PR

Design docs (docs/design/llama-stack-config-merge/):

  • llama-stack-config-merge-spike.md (use the "Outline" button) — the spike: research, design alternatives, PoC results, reviewer decisions (S1–S5, T1–T8), proposed JIRAs (3 Epics).
  • llama-stack-config-merge.md (use the "Outline" button) — the permanent feature spec (requirements, acceptance test surface, architecture, implementation suggestions).

PoC evidence (docs/design/llama-stack-config-merge/poc-results/):

  • Unified-mode lightspeed-stack.yaml used for validation, synthesized run.yaml, a real /v1/query response, and a README.
  • Pydantic AI ↔ Llama Stack research report (informs Decision S5; stays in-repo permanently).

Main findings

(The PoC code that produced these has been removed from the branch; the findings and the evidence under poc-results/ remain.)

  1. Option C + D works end-to-end. A unified lightspeed-stack.yaml (no external run.yaml) boots LCORE in library mode, serves /v1/query, and applies native_override correctly. 2098 unit tests passed including a lossless migrate-then-synthesize round-trip.
  2. Backward compat via shape detection — legacy library_client_config_path and the unified synthesis inputs are mutually exclusive and coexist through the deprecation window; the migration tool produces a lossless unified file from the legacy pair.
  3. Library client requires a file path, not a dict — confirmed against AsyncLlamaStackAsLibraryClient; library mode must write the synthesized file to disk.
  4. Default baseline inherits ${env.EXTERNAL_PROVIDERS_DIR} from the repo's run.yaml; implementation should slim the baseline or default the var. Flagged in the spike's "Findings discovered during PoC".
  5. High-level inference provider_id naming collision (sentence_transformers vs sentence-transformers) — per Decision S5 each backend-specific synthesizer translates LCORE's canonical vocabulary; implementation fixes the LS-side translation.
  6. Server-mode docker-compose e2e not re-validated — same synthesis code path as the library-mode PoC + unit tests; implementation JIRA re-runs it.
  7. Vacuous safety-shield validation in PoC — the PoC native_override registered llama-guard with provider_shield_id: openai/gpt-4o-mini (a chat model); implementation e2e must exercise a real Llama Guard model.

For reviewers

Strategic decisions — @sbunciak (PM) / @tisnik:

Technical decisions — @tisnik / team leads (T8 — @radofuchs):

Proposed JIRAs3 Epics + 9 children, grouped by aspect:

  • Unified-config implementation: schema + synthesizer, migration tool, LS entrypoint + deployment, deprecation WARN.
  • E2E and test-config coverage: e2e kickoff (feature files first, no impl), e2e/integration config migration, step definitions.
  • Documentation: docs migration to unified mode, reference profiles.

The decisions and proposed-JIRAs sections are where your input is needed; they link to background sections (current architecture, design alternatives, merge-semantics, mode detection) which are optional reference.

Pre-merge state

  • PoC code + tests: already removed (this branch was rebased onto upstream/main and the PoC commit dropped). The diff is docs-only.
  • poc-results/ evidence: retained for now. Whether to drop the library-mode/ evidence is the /spike-finalize decision (remove_poc_dir_before_merge = ask). The Pydantic AI research report stays (referenced from Decision S5).
  • Spike doc + spec doc: stay in the repo.
  • JIRAs: filed under LCORE-836 as 3 Epics + 9 children (dev-tools/file-jiras.sh), then the spike doc's LCORE-???? placeholders get replaced with the filed keys (/spike-finalize).

Tools used to create PR

  • Assisted-by: Claude Opus 4.7 (1M context)
  • Generated by: Claude Opus 4.7 (1M context)

Related Tickets & Documents

  • Related Issue # LCORE-777, LCORE-778, LCORE-781 (sibling / deferred work)
  • Related Issue # LCORE-509 (parent epic — streamline config)
  • Related Issue # LCORE-518 (prior spike — closed, groundwork)
  • Related Issue # LCORE-779 (prior auto-regeneration work — closed, groundwork)
  • Closes # LCORE-836

Checklist

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • PR has passed all pre-merge test jobs.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds design + PoC docs and implements unified-mode config synthesis: new llama_stack.config schema, synthesizer (baseline/profile/inference/native_override with deep-merge list-replace), dumb migration tool, library/client branching, CLI migrate workflow, default baseline, and tests/PoC artifacts.

Changes

Unified Configuration Synthesis and Integration

Layer / File(s) Summary
Design specification and architecture
docs/design/llama-stack-config-merge/llama-stack-config-merge-spike.md, docs/design/llama-stack-config-merge/llama-stack-config-merge.md
Two design documents specify the unified config approach: spike covers technical decisions, PoC outcomes, and follow-on JIRAs; production design details schema, control-flow triggers, validation rules, merge semantics, migration paths, and error handling.
PoC artifacts and examples
docs/design/llama-stack-config-merge/poc-results/*
Library-mode PoC: README, synthesized-run.yaml, query-response.json, and a unified LCS PoC YAML demonstrating profile usage and native_override behavior.
Pydantic AI portability research
docs/design/llama-stack-config-merge/poc-results/pydantic-ai-research.md
Research mapping Pydantic AI concepts to operator YAML, marking inference as portable and recommending backend-specific subtrees for rag/safety/tool-runtime.
Design schema and validation (docs → models)
docs/..., src/models/config.py, tests/unit/models/config/*
Adds UnifiedLlamaStackConfig, UnifiedInferenceSection, UnifiedInferenceProvider, extends LlamaStackConfiguration with optional config (mutually exclusive with library_client_config_path) and validation updates; tests updated to assert new model behaviors.
Configuration synthesis pipeline and migration
src/llama_stack_configuration.py, tests/unit/test_llama_stack_synthesize.py
Implements load_default_baseline, deep_merge_list_replace (list-replace semantics), apply_high_level_inference, synthesize_configuration, synthesize_to_file, and migrate_config_dumb; comprehensive unit tests for merge, inference expansion, profile loading, immutability, and dumb-mode round-trip.
Library client branching
src/client.py, tests/unit/test_client.py
Library-client initialization now branches on unified config vs legacy library_client_config_path; unified mode synthesizes run.yaml to a temp file for the library client; raises when neither is present.
CLI migrate workflow & runtime detection
src/lightspeed_stack.py, scripts/llama-stack-entrypoint.sh
Adds --migrate-config flags to run a dumb migration flow; main() auto-detects unified mode (presence of llama_stack.config) and bypasses legacy enrichment; entrypoint messaging updated to describe unified vs legacy modes.
Default baseline and runtime defaults
src/data/default_run.yaml, test.containerfile
New default run.yaml baseline enabling typical providers, storage, safety, and server defaults; containerfile includes src/data and adjusts ownership for runtime synthesis.
Tests and PoC validation
tests/unit/test_llama_stack_synthesize.py, tests/unit/models/config/*, docs/design/llama-stack-config-merge/poc-results/*
Unit tests added/updated to exercise deep-merge semantics, inference provider expansion, synthesis control flow, config validation, migration round-trip, and secret/env-ref preservation; PoC evidence included in docs.

Sequence Diagram

sequenceDiagram
  participant Client
  participant Synthesizer as llama_stack_configuration.synthesize_configuration
  participant Filesystem as Disk
  participant Library as AsyncLlamaStackAsLibraryClient
  Client->>Synthesizer: pass `llama_stack.config`
  Synthesizer->>Filesystem: write synthesized run.yaml (synthesize_to_file)
  Filesystem->>Library: provide path to synthesized run.yaml
  Library->>Filesystem: read run.yaml and initialize client
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • tisnik
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The PR title accurately and concisely describes the main objective: implementing a spike to merge run.yaml configuration into lightspeed-stack.yaml.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/lightspeed_stack.py (1)

117-165: ⚠️ Potential issue | 🟡 Minor

Stale main() docstring — references non-existent flag and omits the new --migrate-config flow.

The docstring still mentions --generate-llama-stack-configuration, which is not a CLI flag in this parser, and it does not describe the newly added --migrate-config / --run-yaml / --migrate-output branch that exits before load_configuration runs. Please update the docstring to match the actual behavior so the Raises: SystemExit paths and flag list stay accurate.

✏️ Proposed docstring fix
     - If --dump-schema is provided, writes the active configuration schema to
       schema.json and exits (exits with status 1 on failure).
-    - If --generate-llama-stack-configuration is provided, generates and stores
-      the Llama Stack configuration to the specified output file and exits
-      (exits with status 1 on failure).
+    - If --migrate-config is provided, migrates the legacy
+      (run.yaml + lightspeed-stack.yaml) setup into a unified single-file
+      configuration at --migrate-output and exits (status 1 on failure).
+      This branch bypasses configuration.load_configuration().
     - Otherwise, sets LIGHTSPEED_STACK_CONFIG_PATH for worker processes, starts
       the quota scheduler, and starts the Uvicorn web service.

     Raises:
-        SystemExit: when configuration dumping or Llama Stack generation fails
-                    (exits with status 1).
+        SystemExit: when configuration dumping, schema dumping, or config
+                    migration fails (exits with status 1).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lightspeed_stack.py` around lines 117 - 165, Update the main() docstring
to remove the obsolete reference to --generate-llama-stack-configuration, add
documentation for the new --migrate-config flow and its related flags
(--run-yaml and --migrate-output), and state that when args.migrate_config is
true the migration branch (handled by migrate_config_dumb) runs and exits before
load_configuration; also keep the note that failures in dumping/migration raise
SystemExit (status 1). Reference main(), args.migrate_config, and
migrate_config_dumb so the updated docstring accurately matches the implemented
control flow.
src/llama_stack_configuration.py (1)

918-931: ⚠️ Potential issue | 🔴 Critical

Separate raw and expanded config to prevent writing secrets to output file.

synthesize_to_file() is documented to preserve env-var references like ${env.FOO} verbatim in the output. The current code expands all environment variables on line 920 before passing to synthesize_to_file(), causing secrets to be written in plaintext to the generated run.yaml.

Use raw_config for synthesize_to_file() (preserves env refs as documented), and expanded_config for setup_azure_entra_id_token() (which requires actual credential values) and generate_configuration() (legacy mode enrichment):

Keep raw unified config separate from expanded legacy config
     with open(args.config, "r", encoding="utf-8") as f:
-        config = yaml.safe_load(f)
-        config = replace_env_vars(config)
+        raw_config = yaml.safe_load(f)
+        expanded_config = replace_env_vars(raw_config)
 
-    unified_present = (config.get("llama_stack") or {}).get("config") is not None
+    unified_present = (raw_config.get("llama_stack") or {}).get("config") is not None
     if unified_present:
         logger.info("Unified mode detected (llama_stack.config present)")
         # Azure Entra ID side-effect (writes .env) stays part of boot — still run it.
-        setup_azure_entra_id_token(config.get("azure_entra_id"), args.env_file)
+        setup_azure_entra_id_token(expanded_config.get("azure_entra_id"), args.env_file)
         synthesize_to_file(
-            config,
+            raw_config,
             args.output,
             config_file_dir=Path(args.config).resolve().parent,
         )
     else:
         logger.info("Legacy mode detected (no llama_stack.config)")
-        generate_configuration(args.input, args.output, config, args.env_file)
+        generate_configuration(args.input, args.output, expanded_config, args.env_file)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llama_stack_configuration.py` around lines 918 - 931, The code currently
calls replace_env_vars on the loaded config and then passes that expanded config
into synthesize_to_file, which will write expanded secrets into the output;
instead, keep two variables: raw_config = yaml.safe_load(...) (no
replace_env_vars) and expanded_config = replace_env_vars(raw_config); use
raw_config when calling synthesize_to_file(...) so env-var references remain
verbatim, and use expanded_config when calling setup_azure_entra_id_token(...)
and when invoking generate_configuration(...) or any legacy-mode enrichment that
needs real values; update references of config in this block to the appropriate
raw_config or expanded_config names accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/design/llama-stack-config-merge/llama-stack-config-merge-spike.md`:
- Around line 274-284: The markdown file contains fenced code blocks that lack
surrounding blank lines and trigger markdownlint MD031; for each fenced block
(e.g., the "Agentic tool instruction" block and the other blocks at the noted
ranges like 310-317, 343-349, 376-381, 411-416, 438-443, 463-468, 598-626) add
an empty line immediately before the opening ``` and an empty line immediately
after the closing ``` so every fenced code block is separated by blank lines
from surrounding text/content.
- Line 147: The in-page anchor in the link "See [Merge semantics worked
examples](`#merge-semantics-worked-examples`)." doesn't match the actual heading
"### Merge semantics — worked examples"; update either the link target or the
heading so they match under markdownlint rules — e.g., change the heading to use
a plain hyphen "### Merge semantics - worked examples" or change the link to the
exact slug produced from the em-dash (percent-encoded or the renderer-specific
slug), ensuring the text in the link target and the heading "Merge semantics —
worked examples" are identical.

In `@docs/design/llama-stack-config-merge/llama-stack-config-merge.md`:
- Around line 280-291: Update the doc to stop claiming synthesized files are
safe as world-readable and instead require restrictive file permissions and
recommend env-var references; specifically change wording around
native_override, apply_high_level_inference, replace_env_vars and run.yaml to
note that native_override and migrations may include literal secrets (e.g., API
keys), that apply_high_level_inference may not always emit only ${env.<VAR>} and
replace_env_vars is not a guarantee for all inputs, and mandate default
restrictive filesystem permissions and operator guidance to prefer env refs and
secrets management rather than declaring world-readable output acceptable.

In `@docs/design/llama-stack-config-merge/poc-evidence/library-mode/README.md`:
- Around line 3-8: The fenced bash code block starting with "```bash" in the
README.md is missing blank lines before and/or after (markdownlint MD031); add a
single blank line immediately above the opening ```bash line and a single blank
line immediately below the closing ``` line so the fenced block is separated
from surrounding text and satisfies MD031.

In
`@docs/design/llama-stack-config-merge/poc-evidence/library-mode/synthesized-run.yaml`:
- Around line 107-110: The shields entry has an incorrect provider_shield_id
value: replace the OpenAI chat model id in the shields list (symbols: shields,
provider_id, provider_shield_id, shield_id) with the correct Llama Guard model
id or remove the mistaken native_override that injected it; specifically, ensure
provider_shield_id uses a guard model identifier such as
"meta-llama/Llama-Guard-3-8B" (or the intended guard model) and verify the
native_override/source that produced the OpenAI id is fixed so future artifacts
don’t carry an OpenAI model as a Llama Guard shield.

In
`@docs/design/llama-stack-config-merge/poc-evidence/lightspeed-stack-unified-library.yaml`:
- Line 18: The committed YAML contains a machine-local absolute path in the
profile key ('profile:
/home/msvistun/repos/lightspeed/stack/tests/e2e/configs/run-ci.yaml'); remove or
replace this with a portable repo-relative path (for example
'./tests/e2e/configs/run-ci.yaml') or delete the profile entry if not needed so
the PoC artifact is reusable outside the author’s workstation; update any
references that expect the old absolute path (search for the profile key) to use
the new relative path.

In `@src/client.py`:
- Around line 104-108: The code currently writes the synthesized config to a
predictable temp path via the synthesized_path variable and open(...), which is
unsafe for concurrent processes and symlink attacks; change this to create a
secure, unique temp file (e.g. via tempfile.NamedTemporaryFile(delete=False) or
os.open with tempfile.mkstemp) and write ls_config to that securely opened file
using yaml.dump, then set restrictive permissions (chmod 0o600) on the new file
and close it before returning/using the path; reference the synthesized_path
creation, the open(...) write block, and the yaml.dump call when making the
change.

In `@src/data/default_run.yaml`:
- Line 155: Remove the trailing blank line at the end of the file to satisfy
YAMLlint's empty-lines rule: edit src/data/default_run.yaml (the file containing
default run config) and delete the final blank/empty line so the file ends with
the last YAML content line only (ensure there is no extra empty line after the
last document or mapping).
- Line 18: The baseline exposes external_providers_dir:
${env.EXTERNAL_PROVIDERS_DIR} without a safe fallback, causing configs to break
when the env var is absent; either provide a sensible default (e.g., set
external_providers_dir to a known safe path or an empty string/null) or remove
the key from the baseline so profiles supply it; update the default_run.yaml
entry for external_providers_dir to use a safe literal default value or delete
the line, and ensure any code reading external_providers_dir handles the chosen
default.

In `@src/lightspeed_stack.py`:
- Around line 150-165: The except block for the migrate_config_dumb call should
log the full traceback; replace the logger.error("Migration failed: %s", e) call
with logger.exception("Migration failed") so the stack trace is captured when
args.migrate_config triggers migrate_config_dumb (refer to migrate_config_dumb
and the surrounding try/except using logger).

In `@src/llama_stack_configuration.py`:
- Around line 825-829: The file write is using open(output_file, "w") which can
create world-readable files; change the write to create the file with explicit
restrictive permissions (mode 0o600). After ensuring the parent directory is
created (the existing Path(output_file).parent.mkdir call is fine), open the
file using a low-level create with os.open(...) with flags
O_WRONLY|O_CREAT|O_TRUNC and mode 0o600, wrap the returned fd with os.fdopen to
get a text file object, then use yaml.dump(ls_config, f, Dumper=YamlDumper,
default_flow_style=False) to write; reference the variables/functions ls_config,
output_file, synthesize_configuration, and YamlDumper so you update the existing
block that writes the synthesized configuration.

In `@tests/unit/test_client.py`:
- Around line 84-90: Replace the current test that relies on
AsyncLlamaStackClientHolder().load(...) to trigger a runtime ValueError with a
direct validation test against the Pydantic model: construct an invalid
LlamaStackConfiguration(...) instance (setting library_client_config_path=None
and leaving both unified and legacy unset) inside pytest.raises(ValueError,
match="neither .*unified.* nor .*legacy.* is set") so the
check_llama_stack_model validator on the LlamaStackConfiguration model is
exercised; this ensures validation fails at model construction rather than via
AsyncLlamaStackClientHolder.load or _load_library_client runtime checks.

In `@tests/unit/test_llama_stack_synthesize.py`:
- Around line 1-371: This PoC test suite should not be merged as-is; either
delete the whole test file (remove the new tests referencing
synthesize_configuration/migrate_config_dumb/apply_high_level_inference) or, if
you intentionally keep it, change tests to avoid relying on in-place mutation by
using the functional return value of apply_high_level_inference (don't assert
that the passed ls_config instance was mutated) and annotate the
MINIMAL_BASELINE constant with Final[dict[str, Any]] and update any tests that
mutate it to operate on a copy (e.g., use copy.deepcopy) so the baseline is not
altered in-place.

---

Outside diff comments:
In `@src/lightspeed_stack.py`:
- Around line 117-165: Update the main() docstring to remove the obsolete
reference to --generate-llama-stack-configuration, add documentation for the new
--migrate-config flow and its related flags (--run-yaml and --migrate-output),
and state that when args.migrate_config is true the migration branch (handled by
migrate_config_dumb) runs and exits before load_configuration; also keep the
note that failures in dumping/migration raise SystemExit (status 1). Reference
main(), args.migrate_config, and migrate_config_dumb so the updated docstring
accurately matches the implemented control flow.

In `@src/llama_stack_configuration.py`:
- Around line 918-931: The code currently calls replace_env_vars on the loaded
config and then passes that expanded config into synthesize_to_file, which will
write expanded secrets into the output; instead, keep two variables: raw_config
= yaml.safe_load(...) (no replace_env_vars) and expanded_config =
replace_env_vars(raw_config); use raw_config when calling
synthesize_to_file(...) so env-var references remain verbatim, and use
expanded_config when calling setup_azure_entra_id_token(...) and when invoking
generate_configuration(...) or any legacy-mode enrichment that needs real
values; update references of config in this block to the appropriate raw_config
or expanded_config names accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e3d88bb1-37ac-4c41-b1f8-3fd87572f9ef

📥 Commits

Reviewing files that changed from the base of the PR and between 6dddd89 and 06ef987.

📒 Files selected for processing (17)
  • docs/design/llama-stack-config-merge/llama-stack-config-merge-spike.md
  • docs/design/llama-stack-config-merge/llama-stack-config-merge.md
  • docs/design/llama-stack-config-merge/poc-evidence/library-mode/README.md
  • docs/design/llama-stack-config-merge/poc-evidence/library-mode/query-response.json
  • docs/design/llama-stack-config-merge/poc-evidence/library-mode/synthesized-run.yaml
  • docs/design/llama-stack-config-merge/poc-evidence/lightspeed-stack-unified-library.yaml
  • scripts/llama-stack-entrypoint.sh
  • src/client.py
  • src/data/default_run.yaml
  • src/lightspeed_stack.py
  • src/llama_stack_configuration.py
  • src/models/config.py
  • test.containerfile
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/models/config/test_llama_stack_configuration.py
  • tests/unit/test_client.py
  • tests/unit/test_llama_stack_synthesize.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: build-pr
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: Pylinter
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Import FastAPI dependencies with: from fastapi import APIRouter, HTTPException, Request, status, Depends
Import Llama Stack client with: from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for parameters and return types in functions
Use typing_extensions.Self for model validators in Pydantic models
Use modern union type syntax str | int instead of Union[str, int]
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use standard log levels with clear purposes: debug, info, warning, error
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with @abstractmethod decorators
Use @model_validator and @field_validator for Pydantic model validation
Complete type annotations for all class attributes; use specific types, not Any
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections

Files:

  • tests/unit/test_client.py
  • tests/unit/models/config/test_dump_configuration.py
  • src/lightspeed_stack.py
  • tests/unit/models/config/test_llama_stack_configuration.py
  • src/client.py
  • src/models/config.py
  • tests/unit/test_llama_stack_synthesize.py
  • src/llama_stack_configuration.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest; pytest is the standard for this project
Use pytest-mock for AsyncMock objects in tests
Use marker pytest.mark.asyncio for async tests
Unit tests require 60% coverage, integration tests 10%

Files:

  • tests/unit/test_client.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/models/config/test_llama_stack_configuration.py
  • tests/unit/test_llama_stack_synthesize.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models extend ConfigurationBase for config, BaseModel for data models

Files:

  • src/lightspeed_stack.py
  • src/client.py
  • src/models/config.py
  • src/llama_stack_configuration.py
src/**/config*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/config*.py: All config uses Pydantic models extending ConfigurationBase
Base class sets extra="forbid" to reject unknown fields in Pydantic models
Use @field_validator and @model_validator for custom validation in Pydantic models
Use type hints like Optional[FilePath], PositiveInt, SecretStr in Pydantic models

Files:

  • src/models/config.py
🧠 Learnings (7)
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to **/*.py : Import Llama Stack client with: `from llama_stack_client import AsyncLlamaStackClient`

Applied to files:

  • tests/unit/test_client.py
  • src/client.py
  • src/models/config.py
📚 Learning: 2025-12-18T10:21:09.038Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:09.038Z
Learning: In Llama Stack version 0.3.x, telemetry provider configuration is not supported under the `providers` section in run.yaml configuration files. Telemetry can be enabled with just `telemetry.enabled: true` without requiring an explicit provider block.

Applied to files:

  • docs/design/llama-stack-config-merge/poc-evidence/library-mode/synthesized-run.yaml
  • docs/design/llama-stack-config-merge/poc-evidence/lightspeed-stack-unified-library.yaml
  • src/data/default_run.yaml
  • src/models/config.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to **/*.py : Handle `APIConnectionError` from Llama Stack in error handling

Applied to files:

  • src/client.py
📚 Learning: 2026-04-07T15:03:11.530Z
Learnt from: jrobertboos
Repo: lightspeed-core/lightspeed-stack PR: 1396
File: src/app/endpoints/conversations_v1.py:6-6
Timestamp: 2026-04-07T15:03:11.530Z
Learning: In the `llama_stack_api` package, all imports MUST use the flat form `from llama_stack_api import <symbol>`. Sub-module imports (e.g., `from llama_stack_api.common.errors import ConversationNotFoundError`) are explicitly NOT supported and considered a code smell, as stated in `llama_stack_api/__init__.py` lines 15-19. Do not flag or suggest changing root-package imports to sub-module imports for this package.

Applied to files:

  • src/client.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to src/**/config*.py : All config uses Pydantic models extending `ConfigurationBase`

Applied to files:

  • src/models/config.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/config.py
🪛 LanguageTool
docs/design/llama-stack-config-merge/llama-stack-config-merge-spike.md

[style] ~451-~451: Try moving the adverb to make the sentence clearer.
Context: ... Link to the migration doc. Legacy mode continues to fully function. Scope: - Warning emission point: ...

(SPLIT_INFINITIVE)


[style] ~655-~655: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ncluding fields LCORE doesn't model. 3. Existing CI/CD templating that treats run.yaml...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~656-~656: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...s run.yaml as a separate artifact. 4. Existing enrichment behavior (Azure Entra ID, BY...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~673-~673: To elevate your writing, try using more formal phrasing here.
Context: ...|---|---|---| | Do nothing | 0 | Legacy keeps working until deprecation window closes | | Lif...

(CONTINUE_TO_VB)

docs/design/llama-stack-config-merge/llama-stack-config-merge.md

[style] ~303-~303: To elevate your writing, try using more formal phrasing here.
Context: ...|---|---|---| | Do nothing | 0 | Legacy keeps working until deprecation closes | | Lift-and-s...

(CONTINUE_TO_VB)


[style] ~405-~405: The double modal “requires supervised” is nonstandard (only accepted in certain dialects). Consider “to be supervised”.
Context: ...oad means any implementation requires supervised restart, which is out of scope here. - ...

(NEEDS_FIXED)

🪛 markdownlint-cli2 (0.22.0)
docs/design/llama-stack-config-merge/llama-stack-config-merge-spike.md

[warning] 147-147: Link fragments should be valid

(MD051, link-fragments)


[warning] 275-275: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 311-311: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 344-344: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 377-377: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 412-412: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 439-439: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 464-464: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 599-599: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 612-612: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 618-618: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

docs/design/llama-stack-config-merge/poc-evidence/library-mode/README.md

[warning] 4-4: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

🪛 YAMLlint (1.38.0)
src/data/default_run.yaml

[error] 155-155: too many blank lines (1 > 0)

(empty-lines)

🔇 Additional comments (6)
test.containerfile (1)

39-46: LGTM.

Ownership and copy additions align with the unified-mode entrypoint flow.

tests/unit/models/config/test_dump_configuration.py (1)

147-147: Assertion updates are consistent.

Adding "config": None to all five dumped-config expectations matches the new Optional config field on LlamaStackConfiguration.

docs/design/llama-stack-config-merge/poc-evidence/library-mode/query-response.json (1)

1-1: PoC evidence artifact — no action.

Per PR description, PoC evidence is expected to be removed prior to merge.

scripts/llama-stack-entrypoint.sh (1)

3-20: Comment/log update LGTM.

Header and log string accurately describe the two auto-detected modes; no behavior change.

tests/unit/models/config/test_llama_stack_configuration.py (1)

85-145: LGTM — validation coverage matches the new unified/legacy rules.

The tests cover the missing-config error, unified library mode, mutual exclusion, and URL normalization with unified config.

src/models/config.py (1)

750-787: LGTM — mutual-exclusion validation is enforced at model load.

The validator rejects unified+legacy mode, requires one config source in library mode, and preserves legacy path validation.

Comment thread docs/design/llama-stack-config-merge/llama-stack-config-merge-spike.md Outdated
Comment thread docs/design/llama-stack-config-merge/llama-stack-config-merge.md Outdated
Comment thread src/lightspeed_stack.py Outdated
Comment on lines +150 to +165
# --migrate-config runs standalone; does not load config into the singleton,
# since the input may be in legacy form and we are producing its successor.
if args.migrate_config:
# pylint: disable=import-outside-toplevel
from llama_stack_configuration import migrate_config_dumb

try:
migrate_config_dumb(args.run_yaml, args.config_file, args.migrate_output)
logger.info(
"Migration complete. Wrote unified config to %s",
args.migrate_output,
)
except Exception as e:
logger.error("Migration failed: %s", e)
raise SystemExit(1) from e
return
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Migration branch looks reasonable for a spike.

Deferred import avoids pulling synthesis code into the hot boot path, and broad except Exception is acceptable at the CLI boundary. One small nit: consider logger.exception(...) instead of logger.error("Migration failed: %s", e) so the traceback is captured — helpful when users report migration failures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lightspeed_stack.py` around lines 150 - 165, The except block for the
migrate_config_dumb call should log the full traceback; replace the
logger.error("Migration failed: %s", e) call with logger.exception("Migration
failed") so the stack trace is captured when args.migrate_config triggers
migrate_config_dumb (refer to migrate_config_dumb and the surrounding try/except
using logger).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good nit. Implementation JIRA scope — the migration tool's error-handling polish goes into "Migration tool — dumb-mode lift-and-shift" once the PoC code is replaced.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@max-svistunov, understood — makes sense to bundle that error-handling polish with the rest of the migration tool's production hardening rather than touching PoC code. Would you like me to open a GitHub issue to track the logger.exception() improvement as part of the "Migration tool — dumb-mode lift-and-shift" work?

Comment thread src/llama_stack_configuration.py Outdated
Comment on lines +651 to +683
def apply_high_level_inference(
ls_config: dict[str, Any], inference: dict[str, Any]
) -> None:
"""Apply a high-level `inference` block into `ls_config['providers']['inference']`.

Replaces the inference provider list entirely. Use `native_override` for
additive tweaks.

Parameters:
ls_config: Llama Stack config dict (modified in place).
inference: High-level inference section as a dict (with 'providers' list).
"""
providers_out: list[dict[str, Any]] = []
for provider in inference.get("providers", []):
p_type = provider["type"]
entry: dict[str, Any] = {
"provider_id": p_type,
"provider_type": PROVIDER_TYPE_MAP[p_type],
}
cfg: dict[str, Any] = {}
if provider.get("api_key_env"):
cfg["api_key"] = f"${{env.{provider['api_key_env']}}}"
if provider.get("allowed_models"):
cfg["allowed_models"] = provider["allowed_models"]
if provider.get("extra"):
cfg.update(provider["extra"])
if cfg:
entry["config"] = cfg
providers_out.append(entry)

if "providers" not in ls_config:
ls_config["providers"] = {}
ls_config["providers"]["inference"] = providers_out
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Do not use high-level provider type as the emitted provider_id.

For sentence_transformers, this emits provider_id: sentence_transformers, while the baseline and model/vector-store references use sentence-transformers. That leaves synthesized configs with references to a provider ID that no longer exists after providers.inference is replaced.

Normalize emitted provider IDs separately from provider types
 PROVIDER_TYPE_MAP: dict[str, str] = {
     "openai": "remote::openai",
     "sentence_transformers": "inline::sentence-transformers",
@@
     "vllm_rhel_ai": "remote::vllm",
 }
+
+PROVIDER_ID_MAP: dict[str, str] = {
+    "openai": "openai",
+    "sentence_transformers": "sentence-transformers",
+    "azure": "azure",
+    "vertexai": "vertexai",
+    "watsonx": "watsonx",
+    "vllm_rhaiis": "vllm-rhaiis",
+    "vllm_rhel_ai": "vllm-rhel-ai",
+}
@@
         p_type = provider["type"]
         entry: dict[str, Any] = {
-            "provider_id": p_type,
+            "provider_id": PROVIDER_ID_MAP[p_type],
             "provider_type": PROVIDER_TYPE_MAP[p_type],
         }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already a known issue — captured in the spike doc's "Findings discovered during PoC" section (the high-level inference naming collision). The implementation JIRA "Unified llama_stack.config schema + synthesizer" will fix it; Decision S5 (added in de734e7, updated in 8b85c4a) confirms the approach is for each backend-specific synthesizer to translate LCORE's canonical type vocabulary (sentence_transformers Literal) to the target backend's shape (hyphenated provider_id for Llama Stack today; model-string prefixes for Pydantic AI tomorrow).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@max-svistunov, thanks for the context — acknowledged. Since this is a spike and the PoC code will be removed before merge, and Decision S5 already captures the backend-specific translation approach (LCORE canonical sentence_transformers → hyphenated Llama Stack provider_id, with analogous handling for Pydantic AI), I'll leave this as a tracked divergence. No action needed here.


✏️ Learnings added
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: src/llama_stack_configuration.py:651-683
Timestamp: 2026-05-20T08:09:39.201Z
Learning: In `src/llama_stack_configuration.py`, the `apply_high_level_inference` function currently emits `provider_id: p_type` (underscore form, e.g. `sentence_transformers`) directly from the high-level type key, which collides with Llama Stack's hyphenated provider IDs (e.g. `sentence-transformers`). This is a known PoC divergence documented in the spike doc ("Findings discovered during PoC") and tracked in the implementation JIRA "Unified llama_stack.config schema + synthesizer". Decision S5 mandates that each backend-specific synthesizer translates LCORE's canonical type Literal vocabulary to the target backend's expected shape (hyphenated provider_id for Llama Stack; model-string prefixes for Pydantic AI). The PoC code will be removed before merge; the fix belongs in the implementation ticket.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Comment thread src/llama_stack_configuration.py Outdated
Comment on lines +825 to +829
ls_config = synthesize_configuration(lcs_config, config_file_dir=config_file_dir)
logger.info("Writing synthesized Llama Stack configuration to %s", output_file)
Path(output_file).parent.mkdir(parents=True, exist_ok=True)
with open(output_file, "w", encoding="utf-8") as f:
yaml.dump(ls_config, f, Dumper=YamlDumper, default_flow_style=False)
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Write synthesized configs with restrictive permissions.

native_override can contain literal secrets from a migrated run.yaml, so writing with default open(..., "w") can create a world-readable file depending on umask.

Use an explicit `0600` create mode
     ls_config = synthesize_configuration(lcs_config, config_file_dir=config_file_dir)
     logger.info("Writing synthesized Llama Stack configuration to %s", output_file)
     Path(output_file).parent.mkdir(parents=True, exist_ok=True)
-    with open(output_file, "w", encoding="utf-8") as f:
+    fd = os.open(output_file, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
+    with os.fdopen(fd, "w", encoding="utf-8") as f:
         yaml.dump(ls_config, f, Dumper=YamlDumper, default_flow_style=False)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
ls_config = synthesize_configuration(lcs_config, config_file_dir=config_file_dir)
logger.info("Writing synthesized Llama Stack configuration to %s", output_file)
Path(output_file).parent.mkdir(parents=True, exist_ok=True)
with open(output_file, "w", encoding="utf-8") as f:
yaml.dump(ls_config, f, Dumper=YamlDumper, default_flow_style=False)
ls_config = synthesize_configuration(lcs_config, config_file_dir=config_file_dir)
logger.info("Writing synthesized Llama Stack configuration to %s", output_file)
Path(output_file).parent.mkdir(parents=True, exist_ok=True)
fd = os.open(output_file, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
with os.fdopen(fd, "w", encoding="utf-8") as f:
yaml.dump(ls_config, f, Dumper=YamlDumper, default_flow_style=False)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llama_stack_configuration.py` around lines 825 - 829, The file write is
using open(output_file, "w") which can create world-readable files; change the
write to create the file with explicit restrictive permissions (mode 0o600).
After ensuring the parent directory is created (the existing
Path(output_file).parent.mkdir call is fine), open the file using a low-level
create with os.open(...) with flags O_WRONLY|O_CREAT|O_TRUNC and mode 0o600,
wrap the returned fd with os.fdopen to get a text file object, then use
yaml.dump(ls_config, f, Dumper=YamlDumper, default_flow_style=False) to write;
reference the variables/functions ls_config, output_file,
synthesize_configuration, and YamlDumper so you update the existing block that
writes the synthesized configuration.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Landed in the spec doc just now — R10 mandates file mode 0600 via an explicit create flag (not umask), and Security considerations acknowledges that native_override and the dumb migration tool may carry literal secrets, so the file can't be assumed secret-free. See 6bc7a72. The PoC code is scheduled for removal pre-merge; the implementation JIRA inherits R10's requirement.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@max-svistunov, that's a clean resolution for a spike — capturing it as a hard spec requirement (R10: explicit O_CREAT | 0o600, no umask reliance) is exactly the right level of enforcement for implementation work rather than patching PoC code that's being deleted anyway. The acknowledgment in Security Considerations that both native_override and dumb-migration paths can carry literal secrets gives the implementer the necessary context to apply R10 correctly without assumptions. Happy to close this out on my end.

Comment thread tests/unit/test_client.py
Comment on lines 84 to 90
cfg.library_client_config_path = None
with pytest.raises(
ValueError,
match="Configuration problem: library_client_config_path is not set",
match="neither .*unified.* nor .*legacy.* is set",
):
client = AsyncLlamaStackClientHolder()
await client.load(cfg)
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm that load() (or a helper it calls) re-runs validation raising the expected message.
rg -nP --type=py -C4 '\b(class\s+AsyncLlamaStackClientHolder|async\s+def\s+load)\b'
rg -nP --type=py -C2 'neither\s+.*unified.*\s+nor\s+.*legacy'

Repository: lightspeed-core/lightspeed-stack

Length of output: 1339


🏁 Script executed:

# Read the load() method and surrounding code
sed -n '40,100p' src/client.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 2701


🏁 Script executed:

# Read the complete test function context
grep -n "def test_" tests/unit/test_client.py | head -20

Repository: lightspeed-core/lightspeed-stack

Length of output: 448


🏁 Script executed:

# Find the specific test around line 84-90
sed -n '75,95p' tests/unit/test_client.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 830


🏁 Script executed:

# Search for LlamaStackConfiguration class and check_llama_stack_model validator
rg -nP --type=py 'class\s+LlamaStackConfiguration|def\s+check_llama_stack_model' -A 20

Repository: lightspeed-core/lightspeed-stack

Length of output: 3016


Consider testing validation at the model layer instead of relying on load() runtime checks.

The test passes, but not through Pydantic re-validation as suggested. The load() method does not re-run the check_llama_stack_model validator. Instead, the ValueError is raised by explicit runtime checks inside _load_library_client() (src/client.py:73-77). While this works today, the test would silently pass if those runtime checks are ever removed or refactored. A more robust test would construct the invalid config directly via LlamaStackConfiguration(...) inside pytest.raises to pin the expected validation behavior at the model layer where the check_llama_stack_model validator should enforce it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/test_client.py` around lines 84 - 90, Replace the current test
that relies on AsyncLlamaStackClientHolder().load(...) to trigger a runtime
ValueError with a direct validation test against the Pydantic model: construct
an invalid LlamaStackConfiguration(...) instance (setting
library_client_config_path=None and leaving both unified and legacy unset)
inside pytest.raises(ValueError, match="neither .*unified.* nor .*legacy.* is
set") so the check_llama_stack_model validator on the LlamaStackConfiguration
model is exercised; this ensures validation fails at model construction rather
than via AsyncLlamaStackClientHolder.load or _load_library_client runtime
checks.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged. PoC-side test; scheduled for removal pre-merge.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@max-svistunov, understood — removing the PoC test alongside the rest of the PoC code is a clean resolution. No further action needed on this comment.

Comment thread tests/unit/test_llama_stack_synthesize.py Outdated
Copy link
Copy Markdown
Contributor

@tisnik tisnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks ok as is. There's second approach: have abstracted inference+RAG sections on config file toplevel. It might allow us to ditch llama stack and replace it by XYZ library in future w/o the need to change config schema.

WDYT?

It will be a bit more problematic - basically it means that two config models will be used + some logic to translate between those models.

@max-svistunov
Copy link
Copy Markdown
Contributor Author

max-svistunov commented May 20, 2026

It looks ok as is. There's second approach: have abstracted inference+RAG sections on config file toplevel. It might allow us to ditch llama stack and replace it by XYZ library in future w/o the need to change config schema.

WDYT?

It will be a bit more problematic - basically it means that two config models will be used + some logic to translate between those models.

Incorporated (though not as a second approach, but as extension of the current proposal). PTAL at decision S5.

Things for you to PTAL at:

S5 specifically (the new decision):

  • S5 in full (L129-L216).
  • "Today vs Proposed" table (L144-L150): per the research, only inference lifts; rag / safety / vector_io stay under llama_stack.config.
  • "On the inference.providers[].type vocabulary" (L167-L174): LCORE keeps its canonical Literal vocabulary; each backend-specific synthesizer translates.
  • "Pydantic AI research findings" (L175-L207): per-concept researcher confidence numbers; full report at poc-results/pydantic-ai-research.md.

Also please see if you have any comments on:

  • S2 (L100-L105): deprecation calendar. Q3/Q4 warnings + end-Q4 removal. Feasible from engineering POV? Note the S5 implication: Pydantic AI V2 timing may end up defining the legacy-removal date.

@max-svistunov
Copy link
Copy Markdown
Contributor Author

@sbunciak Could you PTAL?

  1. S2 (L100-L105): deprecation calendar for the legacy path. Current recommendation: "deprecate fully by end of Q4; emit warnings during Q3 and Q4." Please confirm or override against the actual release calendar.
  2. S3 (L106-L112): downstream implications. @tisnik suggested expanding the ask to "Major + anybody from Ansible + RHDH" — your call on whom to ping.
  3. S5 (L129-L216): new decision. The LS→Pydantic AI migration is now on the calendar; S5 lifts backend-agnostic keys (inference.providers) to the top level of lightspeed-stack.yaml so downstream operators don't have to relearn the schema at cutover. Implication for S2: Pydantic AI V2 ("April 2026 at the earliest"; still unshipped as of 2026-05-20) is likely to ship inside the S2 deprecation window, which would mean the cutover effectively sets the legacy-removal date — not S2.
  4. S1 (L79-L99) + S4 (L113-L128): overall shape and out-of-scope items — read for context. @tisnik has already approved S1.

@max-svistunov
Copy link
Copy Markdown
Contributor Author

@sbunciak
Copy link
Copy Markdown
Contributor

S2 - deprecate in 0.6 (no breaking change, emit warnings), remove in 0.7. Release of 0.6 is tentatively planned for the end of June, 0.7 for the end of Sep.
S3 - yes, @major from RHEL LS as they are using LLS in server mode, and @elsony from RHDH as they are using library mode.

@major
Copy link
Copy Markdown
Contributor

major commented May 20, 2026

@sbunciak We're using lightspeed-stack/llama-stack in library mode now, but we can switch to server mode if needed or reorganize how we handle run.yaml.

@tisnik
Copy link
Copy Markdown
Contributor

tisnik commented May 20, 2026

@max-svistunov thanks for improving this PR, still LGTM on my side!

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/design/llama-stack-config-merge/llama-stack-config-merge.md (1)

23-31: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Align the spec’s config shape with Decision S5 before implementation.

This spec still defines inference under llama_stack.config, but the spike’s decided direction (S5, dated May 20, 2026) moves backend-agnostic keys to top-level (inference.providers). Keeping both shapes in authoritative docs will cause implementation and test drift.

Suggested doc update direction
-llama_stack:
-  config:
-    inference:
-      providers: ...
+inference:
+  providers: ...
+llama_stack:
+  config:
+    native_override: ...
+    profile: ...
+    baseline: ...

Also applies to: 58-60, 180-202

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/design/llama-stack-config-merge/llama-stack-config-merge.md` around
lines 23 - 31, Update the spec to match Decision S5 by removing or deprecating
the nested inference key under llama_stack.config and moving backend-agnostic
keys to top-level inference.providers; specifically, replace references to
llama_stack.config.inference with top-level inference.providers, ensure
native_override, profile, and baseline remain available but documented as
top-level/compatible fields (or explicitly marked as deprecated if retained),
and update any examples and the sections referenced (lines ~58-60, 180-202) so
all schema references use inference.providers consistently with the S5 decision.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/design/llama-stack-config-merge/poc-results/pydantic-ai-research.md`:
- Around line 8-16: Markdown fenced code blocks (e.g., the block containing the
Python snippet with Agent and agent.run_sync using
'anthropic:claude-sonnet-4-6') are missing blank lines before/after the fences
causing markdownlint MD031; edit the file to ensure there is exactly one blank
line before the opening ``` and one blank line after the closing ``` for each
fenced block (apply same fix to the other blocks referenced at lines ~26-31,
39-52, 72-84, 151-163) so the Agent snippet and all other fenced blocks conform
to MD031.
- Line 245: The file's final line (the paragraph starting "**Pydantic AI V2
timing.**") is missing a single trailing newline; update the markdown so the
document ends with exactly one newline character (ensure no extra blank lines)
to satisfy MD047 and markdownlint.

---

Outside diff comments:
In `@docs/design/llama-stack-config-merge/llama-stack-config-merge.md`:
- Around line 23-31: Update the spec to match Decision S5 by removing or
deprecating the nested inference key under llama_stack.config and moving
backend-agnostic keys to top-level inference.providers; specifically, replace
references to llama_stack.config.inference with top-level inference.providers,
ensure native_override, profile, and baseline remain available but documented as
top-level/compatible fields (or explicitly marked as deprecated if retained),
and update any examples and the sections referenced (lines ~58-60, 180-202) so
all schema references use inference.providers consistently with the S5 decision.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d84dee53-3a9f-402a-a6ca-331ffd5a4513

📥 Commits

Reviewing files that changed from the base of the PR and between 6bc7a72 and 3d9334c.

📒 Files selected for processing (3)
  • docs/design/llama-stack-config-merge/llama-stack-config-merge-spike.md
  • docs/design/llama-stack-config-merge/llama-stack-config-merge.md
  • docs/design/llama-stack-config-merge/poc-results/pydantic-ai-research.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: server mode / ci / group 3
🧰 Additional context used
🪛 LanguageTool
docs/design/llama-stack-config-merge/poc-results/pydantic-ai-research.md

[style] ~33-~33: The verb “get” can be informal. Consider replacing it with a form of “to be”.
Context: ...ovider entries with type/id/config that get looked up by name at runtime**, the way Llama ...

(GET_USED_ARE_USED)

🪛 markdownlint-cli2 (0.22.1)
docs/design/llama-stack-config-merge/poc-results/pydantic-ai-research.md

[warning] 8-8: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 15-15: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 26-26: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 30-30: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 51-51: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 83-83: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 151-151: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 245-245: Files should end with a single newline character

(MD047, single-trailing-newline)

- **Llama Stack source-file paths in the schema reconstruction could not be directly verified** because raw GitHub fetches were blocked during research. The schema sketch is reconstructed from third-party verbatim quotes (Cerebras, Red Hat, Medium) and PR titles. Before locking your schema, fetch `src/llama_stack/core/datatypes.py` (look for `StackRunConfig`) and a current `src/llama_stack/distributions/starter/run.yaml` to confirm.
- **`pydantic-ai-shields` and `pydantic-ai-guardrails` are third-party**, maintained by independent authors (vstorm-co and jagreehal respectively). They are not part of pydantic/pydantic-ai. Treat them as community packages whose APIs may diverge from anything the Pydantic team eventually ships.
- **No Pydantic AI public roadmap doc** was found; conclusions about "not on the roadmap" are inferred from the version-policy doc, the v1 launch post, recent release notes, and the absence of relevant milestones — not from a positive statement that these features are out of scope. Flagged as inference, not fact.
- **Pydantic AI V2 timing.** The version policy says *"V2 will be released in April 2026 at the earliest"*; as of May 20, 2026, V2 has not shipped and the changelog page lists only V1.x entries. The user's stated migration window (2026 or Q1 2027) likely overlaps the V2 release; plan to re-validate this report when V2 ships. No newline at end of file
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a single trailing newline at EOF (MD047).

The file should end with exactly one newline to satisfy markdownlint and avoid formatting churn.

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 245-245: Files should end with a single newline character

(MD047, single-trailing-newline)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/design/llama-stack-config-merge/poc-results/pydantic-ai-research.md` at
line 245, The file's final line (the paragraph starting "**Pydantic AI V2
timing.**") is missing a single trailing newline; update the markdown so the
document ends with exactly one newline character (ensure no extra blank lines)
to satisfy MD047 and markdownlint.

@max-svistunov
Copy link
Copy Markdown
Contributor Author

@tisnik -- one refinement from reconciling the spec with S5, plus a note.

Refinement: T1 detection now triggers on any synthesis input (a non-empty inference.providers or llama_stack.config), and the mutual-exclusion validator moves to the root Configuration model. Full matrix is in the spec's new Mode detection table; flagging since T1 is yours.

Note: S5's high-level inference will be on the existing top-level inference section (gains a providers list beside default_model/default_provider), not a new key.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
docs/design/llama-stack-config-merge/llama-stack-config-merge.md (2)

65-75: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

R3 is still narrower than the updated synthesis-input contract.

R11 defines mutual exclusion against any synthesis input (inference.providers non-empty or llama_stack.config), but R3 still mentions only llama_stack.config + library_client_config_path. Please expand R3 to match R11 so requirements are internally consistent.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/design/llama-stack-config-merge/llama-stack-config-merge.md` around
lines 65 - 75, Update requirement R3 so it enforces mutual exclusion against any
synthesis input, not just llama_stack.config: when loading configuration, detect
if both llama_stack.library_client_config_path is set AND either
inference.providers is non-empty or llama_stack.config is present, and fail at
configuration load time with a clear error message that points to the
conflicting fields (inference.providers, llama_stack.config, and
llama_stack.library_client_config_path) and instructs the user to choose one
mode; reference R3 and R11 to ensure wording matches the broader
mutual-exclusion contract.

412-416: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix contradictory client-branching guidance in Implementation Suggestions.

Line 415 says to branch on config.config presence, but the design now defines unified mode by synthesis inputs (including top-level inference.providers). This should match the synthesis-input check to prevent implementing an outdated fork condition.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/design/llama-stack-config-merge/llama-stack-config-merge.md` around
lines 412 - 416, Update the branching condition in _load_library_client to
detect unified mode the same way the synthesizer does (i.e., by checking for
synthesis inputs such as a top-level inference.providers or
UnifiedLlamaStackConfig fields) instead of testing config.config presence; when
the synthesizer-detection indicates unified mode call
_synthesize_library_config() and write the deterministic synthesized config
path, otherwise call the existing _enrich_library_config() for legacy flow —
ensure the check uses the same predicate used by
synthesize_configuration/deep_merge_list_replace (in
llama_stack_configuration.py) so the client and synthesizer share the exact
unified-vs-legacy decision logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/design/llama-stack-config-merge/llama-stack-config-merge-spike.md`:
- Around line 179-185: Update the document's "Backward compatibility scope"
table to use the new mode-detection rule (presence of any synthesis input)
instead of the old "`llama_stack.config` present" rule: change its condition to
check for non-empty inference.providers OR presence of llama_stack.config (or
generally "any synthesis input present"), and make the table's wording
consistent with the "Mode detection" table and Decision T1's expanded shape rule
so both tables express the same detection logic.

---

Outside diff comments:
In `@docs/design/llama-stack-config-merge/llama-stack-config-merge.md`:
- Around line 65-75: Update requirement R3 so it enforces mutual exclusion
against any synthesis input, not just llama_stack.config: when loading
configuration, detect if both llama_stack.library_client_config_path is set AND
either inference.providers is non-empty or llama_stack.config is present, and
fail at configuration load time with a clear error message that points to the
conflicting fields (inference.providers, llama_stack.config, and
llama_stack.library_client_config_path) and instructs the user to choose one
mode; reference R3 and R11 to ensure wording matches the broader
mutual-exclusion contract.
- Around line 412-416: Update the branching condition in _load_library_client to
detect unified mode the same way the synthesizer does (i.e., by checking for
synthesis inputs such as a top-level inference.providers or
UnifiedLlamaStackConfig fields) instead of testing config.config presence; when
the synthesizer-detection indicates unified mode call
_synthesize_library_config() and write the deterministic synthesized config
path, otherwise call the existing _enrich_library_config() for legacy flow —
ensure the check uses the same predicate used by
synthesize_configuration/deep_merge_list_replace (in
llama_stack_configuration.py) so the client and synthesizer share the exact
unified-vs-legacy decision logic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e80b8187-62bf-4565-8063-e8c1434061b4

📥 Commits

Reviewing files that changed from the base of the PR and between 3d9334c and 9cf8351.

📒 Files selected for processing (2)
  • docs/design/llama-stack-config-merge/llama-stack-config-merge-spike.md
  • docs/design/llama-stack-config-merge/llama-stack-config-merge.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E Tests for Lightspeed Evaluation job

Comment on lines +179 to +185
**Mode-detection knock-on.** Because the `inference:` section always
exists (it carries defaults), unified mode is signalled by
`inference.providers` being **non-empty** — or by the presence of
`llama_stack.config` — not by the section merely existing. This expands
Decision T1's shape rule from "`llama_stack.config` present" to "any
*synthesis input* present"; see the spec doc's "Mode detection" table.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Align this doc’s own mode-detection table with the new “synthesis input” rule.

This section correctly expands detection to inference.providers (non-empty) OR llama_stack.config, but the later “Backward compatibility scope” table still uses the old "llama_stack.config present" rule. Please update that table to avoid conflicting implementation guidance in the same document.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/design/llama-stack-config-merge/llama-stack-config-merge-spike.md`
around lines 179 - 185, Update the document's "Backward compatibility scope"
table to use the new mode-detection rule (presence of any synthesis input)
instead of the old "`llama_stack.config` present" rule: change its condition to
check for non-empty inference.providers OR presence of llama_stack.config (or
generally "any synthesis input present"), and make the table's wording
consistent with the "Mode detection" table and Decision T1's expanded shape rule
so both tables express the same detection logic.

Add the spike doc (decisions up front, background below, 7 proposed
JIRAs) and the spec doc (requirements R1..R11, architecture,
implementation guide, migration worked example) under
`docs/design/llama-stack-config-merge/`.

Key decisions captured for reviewer confirmation:

- Overall shape: Option C (high-level + native_override) with Option E
  (profile feature, no shipped profiles) as an optional layer.
- Deprecation: calendar-based (e.g., "legacy path removed no sooner
  than 6 months after WARN begins"); concrete timing deferred to PM
  review.
- Override precedence: deep-merge with list replacement at leaf level.
- Secrets handling: env-var references preserved verbatim in
  synthesized files; never resolved to disk.
- Format detection: shape-based, with an optional
  `config_format_version` field that, if present, must agree with the
  shape.
- Migration tool shape: `--migrate-config` flag (no CLI refactor);
  dumb lift-and-shift mode only in v1; smart mode deferred.
- Profile distribution: feature only, LCORE ships no profiles of its
  own beyond reference examples under `examples/profiles/`.
- LS process supervision and hot-reload: out of scope (LCORE-777,
  LCORE-778, LCORE-781 territory).

The spike's PoC validated library-mode end-to-end: a
`lightspeed-stack.yaml` containing only `llama_stack.config` (no
external run.yaml) boots LCORE, serves /v1/query with a real model
response, and a `native_override` value demonstrably takes effect in
the synthesized run.yaml. Server-mode end-to-end through
docker-compose was skipped because the LS container image rebuild
(~2 GB, UBI + llama-stack llslibdev dependency sync) was impractical
for the spike timeline; the same synthesis code path is exercised by
the unit tests, including the lossless migrate-then-synthesize
round-trip.

PoC evidence is under `poc-evidence/library-mode/` as reference
material for reviewers, and per the spike howto it is intended to be
removed from the branch prior to merge. The spike doc and spec doc
remain permanent.
…entation)

Incorporates reviewer request: the work on this feature kicks off with a
Story that authors the behave `.feature` files for unified mode BEFORE
the feature is implemented. The intent is to keep test-shape authorship
free of implementation bias and to surface any architectural gaps early.

Adds two JIRAs to the spike doc's proposed-JIRAs list, bringing the
total from 7 to 9:

1. LCORE-???? (Story, inserted first) — E2E feature files for unified
   mode (no step implementation). Authors Gherkin scenarios against the
   spec doc's R1..R11 requirements. Explicitly forbids reading the
   implementation JIRAs or the synthesizer code while authoring.
   behave marks resulting steps as undefined; test-e2e still green
   (undefined scenarios are reported, not failed).

2. LCORE-???? (Task, inserted after the migrate-e2e-configs Story) —
   Implement behave step definitions for the kickoff feature files.
   Takes the Gherkin as-is (does not water down the tests to fit
   implementation). Blocked by the kickoff ticket plus the feature-
   implementation tickets (schema + synthesizer, migration tool, LS
   container entrypoint).

Filing both tickets together (rather than filing only the kickoff and
"letting the step-def ticket appear later") makes the dependency chain
explicit from the start and ensures the step-def work is not forgotten.

No other JIRAs change scope. The PR template is updated to reflect the
new count and to widen the "Full JIRA list" link range to cover both
new sections.
Per docs/contributing/howto-organize-poc-output.md (line 9), the convention
is `poc-results/`. The LCORE-836 PoC bundle was written under
`poc-evidence/` by mistake; rename to match the documented convention.

Also updates references in the spike doc and the PoC README.
Revisions arising from a re-read of the spike doc.  Spike doc:

- Add a "Design options A-E" explainer section near the top so readers
  encounter the option short names + one-line summaries before any
  decision references them.  Drop options D and F (previously dropped
  without scoring) entirely; renumber the survivors so letters are
  consecutive: old E (Profiles) -> new D, old G (Kustomize patches)
  -> new E.  References throughout the doc updated accordingly.
- Restructure Decision S1: one standalone option per row in the
  choice table, no more "B+C" / "C+E" composite labels.  Intro prose
  reduced to a link to the scoring section; recommendation spells out
  C (base shape) + D (optional profile layer).
- Replace Decision S2 with a single Q3/Q4 deprecation recommendation.
- Simplify Decision S3 to the "anything apart from Konflux?" framing.
- Decision T6: refer to profiles as "Option D layer" (was E).
- Decision T7: define "round-trip" in plain English on first use;
  recommend `default` as the field's default value; remove the
  reviewer-naming question.
- Decision T8: drop the Tekton-pipelines framing, Konflux only;
  name @radofuchs explicitly as the owner.
- Delete Decision T9 entirely (it was informational, not a decision).
  Its content is now an expanded bullet under "Findings discovered
  during PoC" covering the underlying library-client API constraint,
  three concrete implementation consequences, and a closing note on
  why a dict-only path isn't worth pursuing.
- PoC results section:
  - Drop the "Level 3'" reference (the other levels are never defined
    and the term carries no value here).
  - Rephrase the smart-migration bullet as deferred future work
    pointing at the spec doc's Open Questions.
  - Rewrite the provider_id naming-collision text in plain English.
  - Fix the evidence link block: was pointing at
    poc-evidence/library-mode/ (wrong directory after the earlier
    rename) and listed lightspeed-stack-unified-library.yaml as if
    it lived inside that subdir, whereas it actually sits one level
    up.  Each file now linked at its real location.
  - Rename "Surprise discovered during PoC" -> "Findings discovered
    during PoC".
  - Remove the stale "(Decision T9)" attribution.
- Rename "Current architecture (before LCORE-836)" ->
  "Current architecture" (the parenthetical added no information).
- "Design alternatives considered": add intro paragraph that
  explicitly frames the table as scoring; expand each of the 12
  attribute names from a short phrase to a full definition with
  high/low criteria; table columns renumbered A, B, C, D, E
  (was A, B+C, C+E, E, G); closing paragraph swaps "E layered on top
  of C" -> "D layered on top of C".
- Overview's recommendation paragraph: link target updated from
  Design options A-G to A-E; Option E -> Option D.

Spec doc:

- Realign the deprecation schedule with the new spike S2
  recommendation (Q3/Q4 warnings, end-of-Q4 removal).  Trim "subject
  to PM confirmation against the actual release calendar." -> "subject
  to PM confirmation."
- Rename the stale "see PoC surprise in the spike doc" reference to
  point at the new "Findings discovered during PoC" section.
…nchor

Spec doc:
- R6 reworded: it is true that LCORE itself does not resolve env refs
  on disk, but `native_override` and the dumb migration tool can carry
  literal secrets from a legacy `run.yaml`.  The previous wording made
  a stronger claim than the design can honour, so it has been narrowed
  to "secrets LCORE emits are not resolved on disk".
- R10 strengthened: the synthesized file must be created with mode
  0600 via an explicit create flag (not umask), so that when
  `native_override` or a migrated `run.yaml` does carry a literal
  secret, the file is not world-readable.
- "Security considerations" section rewritten to (a) drop the
  inaccurate "no secrets written to disk" blanket statement,
  (b) acknowledge that `native_override` and dumb-mode migration
  output may contain literal secrets, (c) mandate 0600 on the
  synthesized file, (d) recommend the migration tool's help / docs
  advise operators to replace literal secrets with `${env.<VAR>}`
  refs before or after migration, (e) note that LS's own
  `replace_env_vars()` (in `llama_stack.core.library_client`) is
  the in-memory resolver at LS startup.

Spike doc:
- Fix broken in-page anchor on line 166: the target heading is
  "### Merge semantics — worked examples", and GitHub renders that
  heading's anchor with a double hyphen
  (`#merge-semantics--worked-examples`) because the em-dash is
  stripped and the surrounding spaces both become hyphens.  Link
  display text now matches the heading exactly as well.
CodeRabbit flagged twelve fenced code blocks in the spike doc that
were missing blank lines before or after them (markdownlint MD031).
Applied via `markdownlint-cli2 --fix`.  No content change — purely
blank-line insertions around `\`\`\`text` and `\`\`\`yaml` fences in
the Proposed JIRAs section and Appendix B.

Line numbers of section headings (S1-T8, ## Proposed JIRAs) are
unchanged.  ## PoC results moved from L614 to L642 because the
auto-fix added blank lines inside the JIRAs section that precedes
it.
In response to @tisnik's review comment about future-proofing the
unified config schema against a backend swap, plus the project
direction to migrate from Llama Stack to Pydantic AI over time.

S5 extends S1 — it does NOT replace it.  Option C + optional D
remains the recommended overall shape.  S5 only decides *where* the
backend-agnostic high-level keys (`inference.providers` today;
later `rag.providers`) live in the operator-facing YAML hierarchy:
at the top level, not under the `llama_stack` subtree.  LS-specific
knobs (`native_override`, `profile`, `baseline`) stay under
`llama_stack.config` unchanged.

Confidence is 70%; the recommendation is provisional until a
research pass on Pydantic AI's actual config surface lands.  A
TODO is captured in the decision body so the spike author resolves
the per-provider `type:` vocabulary question post-research.

If S5 is adopted, the existing "Unified llama_stack.config schema +
synthesizer" implementation JIRA's scope shifts to ship the
top-level shape from day one.  No new JIRA is added.
Spike doc:

- Adds a new finding to "Findings discovered during PoC" calling
  out that the PoC's safety-shield validation was vacuous: the
  `native_override` registered `llama-guard` with
  `provider_shield_id: openai/gpt-4o-mini` — an OpenAI chat model,
  not a Llama Guard checkpoint, so the evidence row "`native_override`
  took effect" only proves the key landed, not that a real shield
  gated any query.  Caught by CodeRabbit on
  `poc-results/library-mode/synthesized-run.yaml:110`.  The
  implementation JIRA's e2e coverage must exercise a real Llama
  Guard model (e.g. `meta-llama/Llama-Guard-3-8B`).

- Updates Decision S5 with the Pydantic AI research findings from
  2026-05-20:
  - Drops the "future `rag.providers`" lift from the proposed table;
    research confirms Pydantic AI has no RAG / vector-store
    abstraction and no public roadmap signal one is coming in 6–12
    months.  RAG, safety/shields, vector storage all stay under
    `llama_stack.config`.
  - Adds a "Pydantic AI research findings" subsection citing the
    full report (now in `poc-results/pydantic-ai-research.md`)
    and the researcher's confidence numbers per concept: ~75% for
    `inference`, ~25% for RAG, ~20% for safety, ~60% for MCP.
  - Resolves the `inference.providers[].type` vocabulary TODO:
    keep LCORE's existing Literal vocabulary
    (`openai`, `azure`, `sentence_transformers`, …); each
    backend-specific synthesizer translates to its target shape.
  - Raises decision confidence from 70% to 75%; reservation is
    now research-bounded (Pydantic AI pre-V2 minor-surface churn,
    very low per their stated policy) rather than
    information-gap-bounded.
  - Adds a note that Pydantic AI V2 timing ("April 2026 at the
    earliest" — has not shipped as of 2026-05-20) likely overlaps
    LCORE's migration window; re-validate when V2 ships.

Adds the full research report under
`poc-results/pydantic-ai-research.md` (Llama Stack ↔ Pydantic AI
concept mapping, dated 2026-05-20) so the spike doc's link to it
resolves and so the evidence travels with the spike.
@sbunciak confirmed the two open strategic asks in PR review on
2026-05-20.  Recording the outcomes:

Spike doc:
- S2 (deprecation timeline): decided — warnings in 0.6 (no breaking
  change), legacy path removed in 0.7; tentative releases 0.6 end of
  June 2026, 0.7 end of September 2026.  Drops the "@sbunciak to
  confirm" trailing ask now that it is resolved.
- S3 (downstream implications): answered — the downstream consumers
  to account for are RHEL LS (@major) and RHDH (@elsony), both
  running LCORE in library mode.  RHEL LS is flexible on switching to
  server mode or reorganizing its run.yaml handling; RHDH pending
  confirmation from @elsony.  No design change required — library
  mode is already the primary path.

Spec doc (the permanent reference picks up only the S2 outcome, since
it is a real implementer-facing requirement; the S3 consumer names are
a decision-closure detail that lives in the spike):
- R2: corrected — the startup WARN ships in 0.6 (same release as
  unified mode), not "one release after unified mode lands"; legacy
  removed in 0.7.
- Deprecation schedule: replaced the Q3/Q4 placeholder with the
  confirmed 0.6 warnings / 0.7 removal plan and tentative dates.
Decision S5 moves the high-level inference section to the top level of
lightspeed-stack.yaml, but the spec doc still described it nested under
llama_stack.config (pre-S5). The schema JIRA points implementers at the
spec doc, so this had to be reconciled before filing.

Resolving S5 surfaced a name collision: the root `Configuration` model
already has a top-level `inference:` field (`InferenceConfiguration` —
`default_model` / `default_provider` for query-time routing), always
present via default_factory. Rather than add a competing key, S5's
provider list is added to that existing section as
`inference.providers`. Detection therefore keys on `inference.providers`
being non-empty (the section always exists) OR `llama_stack.config`
present — a "synthesis input" set rather than a single trigger key.

Spec doc:
- "What" / Key shape: inference is a top-level section (existing
  `inference:` extended with `providers`); native_override / profile /
  baseline stay under llama_stack.config.
- R1, R9, R11: unified shape = synthesis inputs (`inference.providers`
  and/or `llama_stack.config`); R9 documents extending
  InferenceConfiguration + the root-model validator.
- New "Mode detection" subsection: full combination matrix (unified /
  legacy / error / remote), url orthogonality, config_format_version
  soft-coupling.
- Pydantic block: drop UnifiedInferenceSection; show
  InferenceConfiguration gaining `providers`, UnifiedLlamaStackConfig
  reduced to backend-specific knobs, and the mutual-exclusion
  model_validator relocated to the root Configuration model.
- Synthesizer pipeline, _load_library_client fork, error handling,
  config-pattern note, and the key-files table all updated to the
  synthesis-input model.
- Open Questions: future rag/safety/storage sections stay under
  llama_stack.config per S5 + the Pydantic AI research.

Spike doc:
- S5: add the collision-resolution paragraph (lands in the existing
  inference section) and the mode-detection knock-on (synthesis-input
  signal, expands T1).
- T1: note the S5 knock-on to shape detection.
- Schema JIRA: extend InferenceConfiguration, drop UnifiedInferenceSection,
  root-model validator, synthesis-input detection.
- Appendix A (PoC files-changed): annotate that the implementation
  follows S5, not the PoC's UnifiedInferenceSection layout.
- E2E feature-files kickoff JIRA: widen the mutual-exclusion scenario
  from the old "llama_stack.config + library_client_config_path" to the
  synthesis-input model (a non-empty inference.providers OR a
  llama_stack.config block, each combined with
  library_client_config_path must fail). Also note top-level
  inference.providers as the unified boot path. Keeps the to-be-filed
  ticket precise post-S5.
- Decision S5: fix the research-report link — stale display text
  (compass_artifact_…) over an absolute URL replaced with a relative
  link to poc-results/pydantic-ai-research.md, matching the doc's other
  references. Surfaced by a consistency audit.
The feature-design process on upstream/main groups proposed JIRAs into an
Epic tree and expects a spec-doc "Acceptance test surface" section that
drives the e2e-kickoff feature files. Conform this PR's artifacts.

Proposed JIRAs — group into three Epics by aspect (the process names
Implementation / Docs / Tests as the grouping aspects for larger
features; this feature spans all three), ordered Implementation-first:
- Epic "Unified-config implementation": schema + synthesizer, migration
  tool, LS entrypoint + deployment, deprecation WARN.
- Epic "E2E and test-config coverage": e2e kickoff, e2e/integration
  config migration, step definitions.
- Epic "Documentation": docs migration, reference profiles.
Implementation leads by maintainer preference; this overrides the
config's e2e_kickoff_position=first default (the kickoff is still the
first child of the Tests Epic, and its no-dependency / author-before-impl
intent is preserved in the ticket and its blocking relationships). Each
Epic carries Goals/Scope prose (becomes the filed Epic description); the
nine tickets are demoted to #### H4 children. Verified with
file-jiras.sh --parse-only: epic_grouped, 3 Epics + 9 children, every
child's parent_epic_file routes to its Epic, types preserved, no [LINT-*].

Spec doc — add an "Acceptance test surface" section (R1..R11 -> observable
behavior -> verified-by), the documented source-of-truth that drives the
e2e-kickoff feature files. Point the kickoff JIRA's agentic instruction at
it and trim the kickoff's inline scenario list, which is now consolidated
in the table (removes the duplication that could drift from the
requirements). No decision or requirement changed — pure consolidation.
@max-svistunov max-svistunov force-pushed the lcore-836-spike-llama-stack-config-merge branch from 9cf8351 to 83e364a Compare May 26, 2026 08:22
The proposed-JIRA sections referenced spike-doc decisions by label
("Decision S5", "Decision S2", "Decision S4", "see Decision T8"). Those
read fine inside the spike doc but are meaningless in a standalone Jira
ticket once filed — a reader has no way to resolve "Decision S5".

Rephrase the references within the Proposed JIRAs section to be
self-contained (the decisions sections elsewhere in the doc keep their
Sx/Tx labels — only the to-be-filed ticket content is changed):

- Implementation Epic Goals/Scope: drop the "(Decision S5/S2/S4)"
  parentheticals; state the facts inline (both library and server modes;
  WARN in 0.6, removed in 0.7; out-of-scope items tracked under
  LCORE-777/778/781).
- Schema + synthesizer ticket: replace "Per Decision S5, ..." with a
  direct statement that `inference.providers` lives on the top-level
  `InferenceConfiguration` (backend-agnostic), pointing to the spec doc
  for the full design.
- LS entrypoint ticket: replace "coordinate with pipeline owner, see
  Decision T8" with "coordinate with the Konflux pipeline owner,
  @radofuchs".

Re-parsed with --parse-only: 3 Epics + 9 children, no [LINT-*], and no
"Decision SX/TX" labels remain in the generated tickets.
Each of the three proposed-JIRA Epic blocks now carries a "Spec doc:"
link to the spec doc's predicted permanent location on main
(github.com/lightspeed-core/lightspeed-stack/blob/main/...). So the
filed Epics give a Jira reader a clickable pointer to the full design.

The URL 404s until this spike PR merges to main; that's acceptable since
the Epics are acted on post-merge. The feature ticket (LCORE-836)
description gets the same URL later, at /spike-finalize, when the URL is
guaranteed live. In-repo cross-references and the child tickets' agentic
instructions deliberately stay as repo-relative paths (rot-proof, correct
for an agent working in a checkout).
JIRAs filed (Epics 2335/2340/2344, children 2336–2339, 2341–2343,
2345–2346, all under LCORE-836 via "Informs"). Replace every LCORE-????
placeholder in the Proposed JIRAs section with the real key:

- The nine `#### LCORE-????` ticket headings -> `#### LCORE-NNNN:` (the
  filed-key form file-jiras.sh recognises for re-sync).
- The `<!-- key: -->` metadata comments.
- Cross-references resolved by what they point at: kickoff Blocks /
  step-defs Blocked-by (-> 2341 kickoff, 2336 schema, 2337 migration,
  2338 entrypoint, 2343 step-defs).

No LCORE-???? placeholders remain. /spike-finalize will verify no orphans
post-merge.
@max-svistunov
Copy link
Copy Markdown
Contributor Author

@tisnik Could you please merge? Jira tickets are filed.

A pre-merge spike<->spec drift check found six JIRA agentic instructions
citing spec-doc sections by names that no longer exist — the spec
consolidated them. The content exists; only the pointers were stale:

- "Migration tool" / "Migration paths" / "Deprecation timeline"
  -> "Migration / backwards compatibility" (+ "Appendix A — Worked
  example" for the migration tool ticket).
- "Architecture -> Server mode" -> "Architecture" (Overview diagram +
  Trigger mechanism cover server mode; there is no Server mode subsection).
- "Profiles" -> "Appendix B — Reference profile example" + the
  "Configuration" sub-section.

Fixed the pointers in the spike doc (lower-risk than restructuring the
already-approved spec doc). All design facts (S5, detection, S2, schema,
migration semantics, acceptance coverage) were verified consistent — the
only drift was these section-name references.

The six affected tickets (LCORE-2337/2338/2339/2342/2345/2346) are
already filed and carry the stale pointers; they need a PUT re-sync from
this corrected spike doc (file-jiras.sh recognises the real keys in the
headings and updates rather than creates).
@tisnik tisnik merged commit 42844d0 into lightspeed-core:main May 26, 2026
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants